
Training the next generation of physicians for artificial intelligence-assisted clinical neuroradiology: ASNR MICCAI Brain Tumor Segmentation (BraTS) 2025 Lighthouse Challenge education platform

Amiruddin, Raisa, Yordanov, Nikolay Y., Maleki, Nazanin, Fehringer, Pascal, Gkampenis, Athanasios, Janas, Anastasia, Krantchev, Kiril, Moawad, Ahmed, Umeh, Fabian, Abosabie, Salma, Abosabie, Sara, Alotaibi, Albara, Ghonim, Mohamed, Ghonim, Mohanad, Mhana, Sedra Abou Ali, Page, Nathan, Jakovljevic, Marko, Sharifi, Yasaman, Bhatia, Prisha, Manteghinejad, Amirreza, Guelen, Melisa, Veronesi, Michael, Hill, Virginia, So, Tiffany, Krycia, Mark, Petrovic, Bojan, Memon, Fatima, Cramer, Justin, Schrickel, Elizabeth, Kosovic, Vilma, Vidal, Lorenna, Thompson, Gerard, Ikuta, Ichiro, Albalooshy, Basimah, Nabavizadeh, Ali, Tahon, Nourel Hoda, Shekdar, Karuna, Bhatia, Aashim, Kirsch, Claudia, D'Anna, Gennaro, Lohmann, Philipp, Nour, Amal Saleh, Myronenko, Andriy, Goldman-Yassen, Adam, Reid, Janet R., Aneja, Sanjay, Bakas, Spyridon, Aboian, Mariam

arXiv.org Artificial Intelligence

High-quality reference-standard image data created by neuroradiology experts for automated clinical tools can be a powerful vehicle for neuroradiology and artificial intelligence education. We developed a multimodal educational approach for students and trainees during the MICCAI Brain Tumor Segmentation (BraTS) Lighthouse Challenge 2025, a landmark initiative to develop accurate brain tumor segmentation algorithms. Fifty-six medical students and radiology trainees volunteered to annotate brain tumor MR images for the BraTS challenges of 2023 and 2024, guided by faculty-led didactics on neuropathology MRI. Among the 56 annotators, 14 selected volunteers were then paired with neuroradiology faculty for guided one-on-one annotation sessions for BraTS 2025. Lectures on neuroanatomy, pathology, and AI, journal clubs, and data-scientist-led workshops were organized online. Annotators and audience members completed surveys on their perceived knowledge before and after the annotations and lectures, respectively. Fourteen coordinators, each paired with a neuroradiologist, completed the data annotation process, averaging 1322.9+/-760.7 hours per dataset per pair and 1200 segmentations in total. On a scale of 1-10, annotation coordinators reported a significant increase in familiarity with image segmentation software from before to after annotation, rising from an initial average of 6+/-2.9 to a final average of 8.9+/-1.1, and a significant increase in familiarity with brain tumor features, rising from an initial average of 6.2+/-2.4 to a final average of 8.1+/-1.2. We demonstrate an innovative offering for providing neuroradiology and AI education through an image segmentation challenge, enhancing understanding of algorithm development, reinforcing the concept of the data reference standard, and diversifying opportunities for AI-driven image analysis among future physicians.


Agent Context Protocols Enhance Collective Inference

Bhardwaj, Devansh, Beniwal, Arjun, Chaudhari, Shreyas, Kalyan, Ashwin, Rajpurohit, Tanmay, Narasimhan, Karthik R., Deshpande, Ameet, Murahari, Vishvak

arXiv.org Artificial Intelligence

AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding. However, building generalist systems requires moving beyond individual agents to collective inference -- a paradigm where multi-agent systems with diverse, task-specialized agents complement one another through structured communication and collaboration. Today, coordination is usually handled with imprecise, ad-hoc natural language, which limits complex interaction and hinders interoperability with domain-specific agents. We introduce Agent Context Protocols (ACPs): a domain- and agent-agnostic family of structured protocols for agent-agent communication, coordination, and error handling. ACPs combine (i) persistent execution blueprints -- explicit dependency graphs that store intermediate agent outputs -- with (ii) standardized message schemas, enabling robust and fault-tolerant multi-agent collective inference. ACP-powered generalist systems reach state-of-the-art performance: 28.3% accuracy on AssistantBench for long-horizon web assistance and best-in-class multimodal technical reports, outperforming commercial AI systems in human evaluation. ACPs are highly modular and extensible, allowing practitioners to build top-tier generalist agents quickly.
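The abstract names two ingredients -- a persistent execution blueprint (a dependency graph that stores intermediate agent outputs) and a standardized message schema -- without specifying them. The sketch below is a minimal hypothetical illustration of how those two ideas compose, not the authors' protocol; the `Message` fields, `Blueprint` class, and error-propagation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical standardized message schema: sender, task id, payload, status."""
    sender: str
    task: str
    payload: str
    status: str = "ok"


class Blueprint:
    """Persistent execution blueprint: a dependency graph whose nodes
    cache the intermediate outputs of the agents that produced them."""

    def __init__(self):
        self.deps = {}     # task -> list of prerequisite tasks
        self.agents = {}   # task -> callable(inputs: dict[str, Message]) -> Message
        self.outputs = {}  # task -> Message (persisted intermediate output)

    def add(self, task, agent, deps=()):
        self.deps[task] = list(deps)
        self.agents[task] = agent

    def run(self, task):
        # Reuse persisted outputs; recurse into unmet dependencies first.
        if task in self.outputs:
            return self.outputs[task]
        inputs = {d: self.run(d) for d in self.deps[task]}
        if any(m.status == "error" for m in inputs.values()):
            # Fault tolerance: propagate a structured error instead of crashing.
            msg = Message("system", task, "", status="error")
        else:
            msg = self.agents[task](inputs)
        self.outputs[task] = msg
        return msg


# Toy usage: a "searcher" agent feeds a "writer" agent via the blueprint.
bp = Blueprint()
bp.add("search", lambda i: Message("searcher", "search", "found 3 docs"))
bp.add(
    "summarize",
    lambda i: Message("writer", "summarize", "summary of " + i["search"].payload),
    deps=["search"],
)
result = bp.run("summarize")
```

Because every intermediate output is persisted in `bp.outputs`, a downstream agent can be re-run or swapped without re-invoking its upstream dependencies, which is the interoperability benefit the abstract attributes to explicit dependency graphs.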


Do It For Me vs. Do It With Me: Investigating User Perceptions of Different Paradigms of Automation in Copilots for Feature-Rich Software

Khurana, Anjali, Su, Xiaotian, Wang, April Yi, Chilana, Parmit K

arXiv.org Artificial Intelligence

Large Language Model (LLM)-based in-application assistants, or copilots, can automate software tasks, but users often prefer learning by doing, raising questions about the optimal level of automation for an effective user experience. We investigated two automation paradigms by designing and implementing a fully automated copilot (AutoCopilot) and a semi-automated copilot (GuidedCopilot) that automates trivial steps while offering step-by-step visual guidance. In a user study (N=20) across data analysis and visual design tasks, GuidedCopilot outperformed AutoCopilot in user control, software utility, and learnability, especially for exploratory and creative tasks, while AutoCopilot saved time for simpler visual tasks. A follow-up design exploration (N=10) enhanced GuidedCopilot with task- and state-aware features, including in-context preview clips and adaptive instructions. Our findings highlight the critical role of user control and tailored guidance in designing the next generation of copilots that enhance productivity, support diverse skill levels, and foster deeper software engagement.


Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking

Khurana, Anjali, Subramonyam, Hari, Chilana, Parmit K

arXiv.org Artificial Intelligence

Large Language Model (LLM) assistants, such as ChatGPT, have emerged as potential alternatives to search methods for helping users navigate complex, feature-rich software. LLMs use vast training data from domain-specific texts, software manuals, and code repositories to mimic human-like interactions, offering tailored assistance, including step-by-step instructions. In this work, we investigated LLM-generated software guidance through a within-subject experiment with 16 participants and follow-up interviews. We compared a baseline LLM assistant with an LLM optimized for particular software contexts, SoftAIBot, which also offered guidelines for constructing appropriate prompts. We assessed task completion, perceived accuracy, relevance, and trust. Surprisingly, although SoftAIBot outperformed the baseline LLM, our results revealed no significant difference in LLM usage and user perceptions with or without prompt guidelines and the integration of domain context. Most users struggled to understand how the prompt's text related to the LLM's responses and often followed the LLM's suggestions verbatim, even if they were incorrect. This resulted in difficulties when using the LLM's advice for software tasks, leading to low task completion rates. Our detailed analysis also revealed that users remained unaware of inaccuracies in the LLM's responses, indicating a gap between their lack of software expertise and their ability to evaluate the LLM's assistance. With the growing push for designing domain-specific LLM assistants, we emphasize the importance of incorporating explainable, context-aware cues into LLMs to help users understand prompt-based interactions, identify biases, and maximize the utility of LLM assistants.


Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

Wu, Skyler, Shen, Eric Meng, Badrinath, Charumathi, Ma, Jiaqi, Lakkaraju, Himabindu

arXiv.org Artificial Intelligence

Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. Understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, yet little work has addressed this question, even though such an understanding is a critical prerequisite for responsible model deployment. We address this question by leveraging gradient-based feature attribution methods, which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importance they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
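The core mechanism here -- gradient-based feature attribution -- assigns each input token a saliency score derived from the gradient of the model's output with respect to that token's embedding. The paper applies this to open-source LLMs via autodiff; the toy below is a stand-in (not the authors' setup) that uses a linear "model" so the gradient is analytic, illustrating the common input-times-gradient attribution. The weight vector `w` and inputs are invented for the example.

```python
import numpy as np

# Toy differentiable "model": score(x) = w . x, a stand-in for an LLM
# logit as a function of its input token embeddings (one scalar per token).
w = np.array([0.5, -2.0, 0.1, 1.5])


def score(x):
    return float(w @ x)


def input_x_gradient_saliency(x):
    # For this linear model, d(score)/dx is exactly w; a real LLM would
    # obtain the gradient via autodiff. The input-times-gradient
    # attribution is then |x * grad|, one saliency score per "token".
    grad = w
    return np.abs(x * grad)


x = np.array([1.0, 1.0, 1.0, 1.0])  # four equally-sized "token" inputs
sal = input_x_gradient_saliency(x)
```

Here the second position dominates the saliency ranking purely because of its weight magnitude; the paper's question is whether CoT prompting changes such rankings (it largely does not) or their stability under perturbation (it does).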


Large Language Models Based Automatic Synthesis of Software Specifications

Mandal, Shantanu, Chethan, Adhrik, Janfaza, Vahid, Mahmud, S M Farabi, Anderson, Todd A, Turek, Javier, Tithi, Jesmin Jahan, Muzahid, Abdullah

arXiv.org Artificial Intelligence

Software configurations play a crucial role in determining the behavior of software systems. In order to ensure safe and error-free operation, it is necessary to identify the correct configurations, along with their valid bounds and rules, which are commonly referred to as software specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure correct operation can become large and prohibitively difficult to manage manually. Due to the fast pace of software development, correct software specifications are often not thoroughly checked or validated within the software itself. Rather, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. As a result, it is hard for system administrators to know the correct specifications of configurations, given the lack of clarity, organization, and a centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work that uses a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in terms of F1 score and can find specifications from single as well as multiple sentences.


Can AI answer your money questions? We put chatbots to the test

#artificialintelligence

NEW YORK, April 13 (Reuters) - Face it, we could all use a little help with our money. So who better to ask for personal finance advice than a couple of the most powerful chatbots on the planet? Both OpenAI's ChatGPT and Google's Bard are dominating headlines recently, for their generative capabilities and vast storehouses of information. Each has far more processing power than, say, any individual personal finance writer (ahem). What is one great business idea?


How AI Could Change the Highly-Skilled Job Market

#artificialintelligence

When most people think of the connection between technology and jobs, they think of robots and automation taking over relatively unskilled jobs like factory work. And thus, the biggest toll from these technological advances would be on already hard-hit manufacturing regions of the Rust Belt. But a new wave of developments in artificial intelligence may have a greater effect on high-skilled jobs and high-tech knowledge regions. The study by Mark Muro, Jacob Whiton, and Robert Maxim takes a close look at the potential of artificial intelligence--or AI--to automate tasks that until now have required human intelligence and decision-making. As they put it: "Unlike robotics (associated with the factory floor) and computers (associated with routine office activities), AI has a distinctly white-collar bent."


Automation and AI sound similar, but may have vastly different impacts on the future of work

#artificialintelligence

Last November, Brookings published a report on artificial intelligence's impact on the workplace that immediately raised eyebrows. Many readers, journalists, and even experts were perplexed by the report's primary finding: that, for the most part, it is better-paid, better-educated white-collar workers who are most exposed to AI's potential economic disruption. This conclusion--by authors Mark Muro, Robert Maxim, and Jacob Whiton--seemed to fly in the face of the popular understanding of technology's future effects on workers. For years, we've been hearing about how these advancements will force mainly blue-collar, lower-income workers out of jobs, as robotics and technology slowly consume those industries. In an article about the November report, The Mercury News outlined this discrepancy: "The study released Wednesday by the Brookings Institution seems to contradict findings from previous studies--including Brookings' own--that showed lower-skilled workers will be most affected by robots and automation, which can involve AI."


Feature and TV films

Los Angeles Times

Mr. Smith Goes to Washington 1939 TCM Tue. 7 p.m. Mean Streets 1973 Cinemax Sun. 6 a.m. Batman Begins 2005 AMC Sun. Throw Momma From the Train 1987 EPIX Sun. Die Hard 1988 IFC Sun. I Know What You Did Last Summer 1997 Starz Tue. Gone in 60 Seconds 2000 CMT Wed. 8 p.m., Thur. Total Recall 1990 Encore Thur. 2 a.m. A Fish Called Wanda 1988 Encore Thur. 2 p.m., 9 p.m. The World Is Not Enough 1999 EPIX Sat. 4 p.m. Look Who's Talking 1989 OVA Sun. Die Hard With a Vengeance 1995 IFC Thur. Oil-platform workers, including an estranged couple, and a Navy SEAL make a startling deep-sea discovery. A clueless politician falls in love with a waitress whose erratic behavior is caused by a nail stuck in her head. After glimpsing his future, an ambitious politician battles the agents of Fate itself to be with the woman he loves. To help a friend, a suburban baby sitter drives into downtown Chicago with her two charges and a neighbor. Two teenage baby sitters and a group of children spend a wild night ...